low-rank matrix recovery
How many samples is a good initial point worth in Low-rank Matrix Recovery?
Given a sufficiently large amount of labeled data, the nonconvex low-rank matrix recovery problem contains no spurious local minima, so a local optimization algorithm is guaranteed to converge to a global minimum starting from any initial guess. However, the amount of data demanded by this theoretical guarantee is very pessimistic, because it must rule out spurious local minima everywhere, including at adversarial locations. In contrast, prior work based on good initial guesses has more realistic data requirements, because it allows spurious local minima to exist outside of a neighborhood of the solution. In this paper, we quantify the relationship between the quality of the initial guess and the corresponding reduction in data requirements. Using the restricted isometry constant as a surrogate for sample complexity, we compute a sharp "threshold" number of samples needed to prevent each specific point on the optimization landscape from becoming a spurious local minimum. Optimizing the threshold over regions of the landscape, we find that, for initial points not too close to the ground truth, a linear improvement in the quality of the initial guess translates into a constant-factor improvement in the sample complexity.
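To make the setup concrete, here is a minimal sketch of the nonconvex matrix sensing problem the abstract refers to: recover a rank-r matrix M* = ZZ^T from m linear measurements y_k = <A_k, M*> by running gradient descent on the factored objective from an initial guess. The Gaussian measurement ensemble, step size, and initialization scale are our illustrative assumptions; the paper's guarantees are stated in terms of the restricted isometry constant of the measurement operator, not any specific ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 20, 2, 400

# Ground truth: a rank-r PSD matrix M* = Z Z^T.
Z_true = rng.standard_normal((d, r))
M_true = Z_true @ Z_true.T

# m linear measurements y_k = <A_k, M*> with Gaussian A_k (illustrative choice).
A = rng.standard_normal((m, d, d)) / np.sqrt(m)
y = np.einsum('kij,ij->k', A, M_true)

def loss_grad(X):
    """f(X) = 0.5 * sum_k (<A_k, X X^T> - y_k)^2 and its gradient in X."""
    resid = np.einsum('kij,ij->k', A, X @ X.T) - y
    G = np.einsum('k,kij->ij', resid, A)
    return 0.5 * resid @ resid, (G + G.T) @ X

# Local search from an initial guess; the paper asks how the quality of
# this guess trades off against the number of measurements m.
X = 0.1 * rng.standard_normal((d, r))
eta = 1e-3
for _ in range(2000):
    _, g = loss_grad(X)
    X -= eta * g

print("relative error:", np.linalg.norm(X @ X.T - M_true) / np.linalg.norm(M_true))
```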
Understanding the Implicit Regularization of Gradient Descent in Over-parameterized Models
Jianhao Ma, Geyu Liang, Salar Fattahi
Implicit regularization refers to the tendency of local search algorithms to converge to low-dimensional solutions, even when such structures are not explicitly enforced. Despite its ubiquity, the mechanism underlying this behavior remains poorly understood, particularly in over-parameterized settings. We analyze gradient descent dynamics and identify three conditions under which it converges to second-order stationary points within an implicit low-dimensional region: (i) suitable initialization, (ii) efficient escape from saddle points, and (iii) sustained proximity to the region. We show that these conditions can be achieved through infinitesimal perturbations and a small deviation rate. Building on this, we introduce Infinitesimally Perturbed Gradient Descent (IPGD), which satisfies these conditions under mild assumptions. We provide theoretical guarantees for IPGD in over-parameterized matrix sensing and empirical evidence of its broader applicability.
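As a rough illustration of the mechanism, the sketch below adds an infinitesimal random perturbation to each gradient step; the function name, perturbation scale, and schedule are our own assumptions, not the paper's exact algorithm or parameters.

```python
import numpy as np

def ipgd(grad, x0, eta=1e-2, sigma=1e-8, n_steps=5000, seed=0):
    """Gradient descent with an infinitesimal random perturbation each step.

    The tiny noise (scale `sigma`, an illustrative choice) helps the iterate
    escape strict saddle points while deviating only negligibly from the
    implicit low-dimensional region that plain gradient descent tracks.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        x = x - eta * grad(x) + sigma * rng.standard_normal(x.shape)
    return x
```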
Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization
Recent advances have shown that the implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge of the intrinsic rank. In contrast, for robust low-rank matrix recovery from grossly corrupted measurements, over-parameterization leads to overfitting without prior knowledge of both the intrinsic rank and the sparsity of the corruption. This paper shows that, with a double over-parameterization of both the low-rank matrix and the sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix without prior knowledge of either the rank of the matrix or the sparsity of the corruption. We further extend our approach to the robust recovery of natural images by over-parameterizing images with deep convolutional networks. Experiments show that our method handles different test images and varying corruption levels with a single learning pipeline, where the network width and termination conditions do not need to be adjusted on a case-by-case basis. Underlying this success is again the implicit bias of discrepant learning rates on different over-parameterized parameters, which may bear on broader applications.
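A minimal sketch of the double over-parameterization idea, under our own assumptions about scales and step sizes: the matrix is parameterized as UU^T with full-width U (no rank prior), the corruption as g∘g − h∘h (no sparsity prior), and gradient descent uses different learning rates on the two blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, m = 15, 2, 600
Z = rng.standard_normal((d, r))
M_true = Z @ Z.T
A = rng.standard_normal((m, d, d)) / np.sqrt(m)
y = np.einsum('kij,ij->k', A, M_true)
y += (rng.random(m) < 0.1) * rng.standard_normal(m) * 10.0  # 10% gross corruption

alpha = 1e-3                             # small initialization scale (our choice)
U = alpha * rng.standard_normal((d, d))  # full over-parameterization: no rank prior
g = alpha * np.ones(m)                   # corruption s = g*g - h*h: no sparsity prior
h = alpha * np.ones(m)
eta_U, eta_s = 1e-3, 5e-4                # discrepant learning rates (illustrative)

for _ in range(3000):
    s = g * g - h * h
    resid = np.einsum('kij,ij->k', A, U @ U.T) + s - y
    G = np.einsum('k,kij->ij', resid, A)
    U -= eta_U * (G + G.T) @ U
    g -= eta_s * 2.0 * resid * g   # d(0.5||resid||^2)/dg =  2 * resid * g
    h += eta_s * 2.0 * resid * h   # d(0.5||resid||^2)/dh = -2 * resid * h

print("matrix error:", np.linalg.norm(U @ U.T - M_true) / np.linalg.norm(M_true))
```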
Normalized Iterative Hard Thresholding for Tensor Recovery
Li Li, Yuneng Liang, Kaijie Zheng, Jian Lu
Low-rank recovery builds upon ideas from the theory of compressive sensing, which predicts that sparse signals can be accurately reconstructed from incomplete measurements. Iterative thresholding-type algorithms, particularly the normalized iterative hard thresholding (NIHT) method, have been widely used in compressed sensing (CS) and applied to matrix recovery tasks. In this paper, we propose a tensor extension of NIHT, referred to as TNIHT, for the recovery of low-rank tensors under two widely used tensor decomposition models. This extension enables the effective reconstruction of high-order low-rank tensors from a limited number of linear measurements by leveraging the inherent low-dimensional structure of multi-way data. Specifically, we consider both the CANDECOMP/PARAFAC (CP) rank and the Tucker rank to characterize tensor low-rankness within the TNIHT framework. We also establish a convergence theorem for the proposed method under the tensor restricted isometry property (TRIP), providing theoretical support for its recovery guarantees. Finally, we evaluate the performance of TNIHT through numerical experiments on synthetic, image, and video data, and compare it with several state-of-the-art algorithms.
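For intuition, here is the matrix-rank analogue of the iteration: TNIHT replaces the truncated SVD below with a CP- or Tucker-rank truncation of a tensor. The Gaussian ensemble and the simple global normalized step size are our illustrative simplifications.

```python
import numpy as np

def truncate_rank(X, r):
    """Best rank-r approximation via truncated SVD (the hard-thresholding step)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def niht(A, y, shape, r, n_iter=200):
    """Matrix-case NIHT: gradient step with a normalized step size, then rank truncation."""
    X = np.zeros(shape)
    for _ in range(n_iter):
        resid = y - np.einsum('kij,ij->k', A, X)
        G = np.einsum('k,kij->ij', resid, A)              # ascent direction A*(y - A(X))
        AG = np.einsum('kij,ij->k', A, G)
        mu = (G * G).sum() / max((AG * AG).sum(), 1e-12)  # normalized step size
        X = truncate_rank(X + mu * G, r)
    return X

rng = np.random.default_rng(2)
d1, d2, r, m = 12, 10, 2, 300
M_true = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
A = rng.standard_normal((m, d1, d2)) / np.sqrt(m)
y = np.einsum('kij,ij->k', A, M_true)
M_hat = niht(A, y, (d1, d2), r)
print("relative error:", np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true))
```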
Review for NeurIPS paper: How many samples is a good initial point worth in Low-rank Matrix Recovery?
Weaknesses: (1) More information could be added to the introduction of the matrix sensing problem. For example, are A_1…A_m known? Are they fixed or random? What are the applications of matrix sensing, and what is its relationship to machine learning? Such information would make the topic clearer and more motivating, especially for readers encountering it for the first time.
Guarantees of a Preconditioned Subgradient Algorithm for Overparameterized Asymmetric Low-rank Matrix Recovery
Paris Giampouras, HanQin Cai, Rene Vidal
In this paper, we focus on a matrix factorization-based approach for robust recovery of low-rank, asymmetric matrices from corrupted measurements. We address the challenging scenario where the rank of the sought matrix is unknown and employ an overparameterized approach using the variational form of the nuclear norm as a regularizer. We propose a subgradient algorithm that inherits the merits of preconditioned algorithms, whose rate of convergence does not depend on the condition number of the sought matrix, while addressing their current limitation, namely the lack of convergence guarantees for asymmetric matrices of unknown rank. In this setting, we provide, for the first time in the literature, linear convergence guarantees for the derived overparameterized preconditioned subgradient algorithm in the presence of gross corruptions. Additionally, by applying our approach to matrix sensing, we highlight its merits when the measurement operator satisfies the mixed-norm restricted isometry properties. Lastly, we present numerical experiments that validate our theoretical results and demonstrate the effectiveness of our approach.
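The sketch below shows the shape of one such update under our own assumptions (an ℓ1 loss for robustness to gross corruptions, and a small damping term `lam` so the Gram matrices stay invertible despite overparameterization): each factor's subgradient is preconditioned by the inverse of the other factor's damped Gram matrix, which is what removes the dependence on the condition number.

```python
import numpy as np

def preconditioned_subgradient_step(A, y, L, R, eta, lam=1e-8):
    """One preconditioned subgradient step on f(L, R) = (1/m) * ||A(L R^T) - y||_1.

    L, R are overparameterized factors (inner dimension r' >= the unknown rank);
    `lam` is a small damping term (our assumption) keeping the Gram matrices
    invertible when the factors are rank-deficient.
    """
    m = y.size
    resid = np.einsum('kij,ij->k', A, L @ R.T) - y
    S = np.einsum('k,kij->ij', np.sign(resid), A) / m   # subgradient of f
    rp = L.shape[1]
    L_new = L - eta * (S @ R) @ np.linalg.inv(R.T @ R + lam * np.eye(rp))
    R_new = R - eta * (S.T @ L) @ np.linalg.inv(L.T @ L + lam * np.eye(rp))
    return L_new, R_new
```

In practice one would iterate this map with a geometrically decaying step size, the schedule typically paired with subgradient methods.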
Universal low rank matrix recovery from Pauli measurements
We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients.
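Concretely, each measurement here is the inner product of the unknown matrix with a random n-qubit Pauli observable; below is a sketch of that measurement model (the normalization convention is our choice).

```python
import numpy as np
from functools import reduce

# Single-qubit Paulis: I, X, Y, Z.
paulis = [np.eye(2),
          np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]])]

def random_pauli(n_qubits, rng):
    """A uniformly random n-fold tensor product of single-qubit Paulis."""
    idx = rng.integers(0, 4, n_qubits)
    return reduce(np.kron, (paulis[i] for i in idx))

rng = np.random.default_rng(4)
n_qubits = 3
d = 2 ** n_qubits

# Unknown matrix M: here a rank-1 density matrix (a pure state), as in tomography.
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)
M = np.outer(psi, psi.conj())

# One Pauli measurement <P, M> = tr(P M); real since both are Hermitian.
P = random_pauli(n_qubits, rng)
measurement = np.trace(P @ M).real / np.sqrt(d)
print("measurement:", measurement)
```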